Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system
Abstract
Corpus-based text-to-speech (CB-TTS) has recently been studied actively throughout the world. Statistical training methods are generally used to derive the prosodic rules in CB-TTS, and the classification and regression tree (CART) is one of the most widely used. In this paper, we present an efficient CART training approach based on z-score phonetic normalization. Our idea stems from the fact that the three most important parameters in CART training for segmental prosody are the phone itself and its left and right neighboring phones, especially in Korean. Our approach effectively reduces the number of CART terminal nodes. The reduction ratios are approximately 14-94% for segmental duration estimation and 45-70% for intensity estimation. The experimental results also show that phonetic normalization slightly reduces the estimation errors.
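The abstract does not spell out how the z-score is computed, so the following is only a minimal sketch of one plausible reading: each segmental value (a duration or an intensity) is normalized by the mean and standard deviation of its own phone class, and the normalized value becomes the CART regression target. The function names, the per-phone grouping, and the fallback for zero variance are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_phone_stats(samples):
    """Collect per-phone mean and standard deviation.

    samples: iterable of (phone, value) pairs, where value is a
    segmental duration or intensity. Grouping by phone identity is an
    assumption; the paper may group differently.
    """
    grouped = defaultdict(list)
    for phone, value in samples:
        grouped[phone].append(value)

    stats = {}
    for phone, values in grouped.items():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        # Assumed fallback: avoid division by zero for constant classes.
        stats[phone] = (mu, sigma if sigma > 0 else 1.0)
    return stats


def normalize(phone, value, stats):
    """Map a raw value to its z-score within the phone class."""
    mu, sigma = stats[phone]
    return (value - mu) / sigma


def denormalize(phone, z, stats):
    """Map a predicted z-score back to a raw duration or intensity."""
    mu, sigma = stats[phone]
    return mu + z * sigma
```

Under this reading, a regression tree would be trained on the z-scores rather than on raw values, and its predictions would be passed through `denormalize` at synthesis time. Because much of the phone-dependent variation is already removed from the target, the tree would need fewer splits on phone identity, which is consistent with the reduction in terminal nodes reported above.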
Similar resources
Implementing an SSML compliant concatenative TTS system
The W3C Speech Synthesis Markup Language (SSML) unifies a number of recent related markup languages that have emerged to fill the perceived need for increased, and standardized, user control over Text to Speech (TTS) engines. One of the main drivers for markup has been the increasing use of TTS engines as embedded components of specific applications – which means they are in a position to take ...
Unsupervised prosody labeling for constructing Mandarin TTS
This paper introduces an unsupervised prosody labeling method for preparing a large speech corpus used in developing a Mandarin Text-to-Speech system. Adopting a four-layer prosody hierarchy, the proposed method performs an unsupervised segmental clustering that iteratively segments spoken utterances into strings of prosodic constituents and models the patterns of the segmented prosodic constit...
Segment selection in the L&H RealSpeak Laboratory TTS system
The L&H RealSpeak Laboratory TTS (RSLab) system is a corpus based speech synthesis system comprising components that deal with linguistic processing, prosody prediction, segment selection, concatenation and modification. In this paper we focus on the segment selection process. During segment selection, the units in a large database of speech are scored with a cost according to their prosodic/ph...
A Quantitative Study on Information Contribution of Prosody Phrase Boundaries in Chinese Speech
In speech, acoustic cues are used to manifest a number of linguistic events including segmental phonemes and supra-segmental ones such as tones, prosodic phrasing structure, intonation, etc. It has been an interesting topic to quantitatively compare the importance of different linguistic events. However, previous studies have been mainly confined to segmental or segment-like units. No studies c...
A Corpus-Based Concatenative Speech Synthesis System for Turkish
Speech synthesis is the process of converting written text into machine-generated synthetic speech. Concatenative speech synthesis systems form utterances by concatenating pre-recorded speech units. Corpus-based methods use a large inventory to select the units to be concatenated. In this paper, we design and develop an intelligible and natural sounding corpus-based concatenative speech synthes...